[C10] An Action Recognition Algorithm Based on Two-Stream Deep Learning for Metaverse Applications
Published in 2024 International Wireless Communications and Mobile Computing (IWCMC), 2024
Action recognition algorithms have gained significant attention in recent years and are indispensable for many cutting-edge applications such as extended reality and the Metaverse. These services often pose stringent requirements on immediate sensing and cognition of the surroundings, which necessitates immediate classification of captured actions (e.g., video data) that classical signal processing methods can hardly attain. In this paper, we introduce a residual artificial neural network with a two-stream structure to further improve the accuracy of action recognition. Specifically, two residual networks (ResNet101) are trained separately, one on spatial RGB image streams and the other on optical flow streams. The outputs of the two-stream network are then fed into a fusion classifier, in which the information extracted by the spatial network and the temporal network jointly determines the classification result. Moreover, during training, hyper-parameter settings and optimizer selection are tuned numerically to achieve optimal performance. Finally, the recognition accuracy of the proposed algorithm is compared with that of widely employed existing counterparts, with the UCF101 data set used for training and testing. Simulations validate that the proposed network achieves higher recognition accuracy.
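The fusion step described in the abstract can be illustrated with a minimal late-fusion sketch. This is not the paper's fusion classifier, whose details the abstract does not give. It assumes each stream emits one logit per action class and combines the two softmax distributions by a weighted average. The names `fuse_scores`, `predict`, and `spatial_weight` are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits):
    """Convert raw class logits into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: simple late fusion by averaging. The paper's fusion
    classifier may work differently.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= spatial_weight <= 1.0:
        raise ValueError("spatial_weight must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)    # RGB (appearance) stream
    p_temporal = softmax(temporal_logits)  # optical flow (motion) stream
    return [
        spatial_weight * s + (1.0 - spatial_weight) * t
        for s, t in zip(p_spatial, p_temporal)
    ]


def predict(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_scores(spatial_logits, temporal_logits, spatial_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

With equal weights, a confident temporal stream can override a less confident spatial stream, which is the sense in which both streams jointly determine the result. Setting `spatial_weight=1.0` reduces the sketch to the spatial stream alone.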
Recommended citation: J. Liu, T. Mao, Y. Huang and D. He, "An Action Recognition Algorithm Based on Two-Stream Deep Learning for Metaverse Applications," in Proc. 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 2024, pp. 639-642.
